Imputation of missing genotypes from sparse to high density using long-range phasing.

نویسندگان

  • Hans D Daetwyler
  • George R Wiggans
  • Ben J Hayes
  • John A Woolliams
  • Mike E Goddard
چکیده

Related individuals share potentially long chromosome segments that trace to a common ancestor. We describe a phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations to phase large sections of a chromosome. In addition to phasing, our method imputes missing genotypes in individuals genotyped at lower marker density when more densely genotyped relatives are available. ChromoPhase uses a pedigree to collect an individual's (the proband) surrogate parents and offspring and uses genotypic similarity to identify its genomic surrogates. The algorithm then cycles through the relatives and genomic surrogates one at a time to find shared chromosome segments. Once a segment has been identified, any missing information in the proband is filled in with information from the relative. We tested ChromoPhase in a simulated population consisting of 400 individuals at a marker density of 1500/M, which is approximately equivalent to a 50K bovine single nucleotide polymorphism chip. In simulated data, 99.9% loci were correctly phased and, when imputing from 100 to 1500 markers, more than 87% of missing genotypes were correctly imputed. Performance increased when the number of generations available in the pedigree increased, but was reduced when the sparse genotype contained fewer loci. However, in simulated data, ChromoPhase correctly imputed at least 12% more genotypes than fastPHASE, depending on sparse marker density. We also tested the algorithm in a real Holstein cattle data set to impute 50K genotypes in animals with a sparse 3K genotype. In these data 92% of genotypes were correctly imputed in animals with a genotyped sire. We evaluated the accuracy of genomic predictions with the dense, sparse, and imputed simulated data sets and show that the reduction in genomic evaluation accuracy is modest even with imperfectly imputed genotype data. Our results demonstrate that imputation of missing genotypes, and potentially full genome sequence, using long-range phasing is feasible.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Imputation of Missing Genotypes from Sparse to High 1 Density using Long - Range Phasing 2 3 4

1 Biosciences Research Division, Department of Primary Industries, 1 Park Dr., Bundoora 3083, Australia; 2 The Roslin Institute and R(D)SVS, The University of Edinburgh, Easter Bush, Roslin, EH25 9RG, United Kingdom; 3 Animal Breeding and Genomics Centre, Wageningen University, 6700 AH, Wageningen, The Netherlands; 4 Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, B...

متن کامل

Imputation of parent-offspring trios and their effect on accuracy of genomic prediction using Bayesian method

The objective of this study was to evaluate the imputation accuracy of parent-offspring trios under different scenarios. By using simulated datasets, the performance Bayesian LASSO in genomic prediction was also examined. The genome consisted of 5 chromosomes and each chromosome was set as 1 Morgan length. The number of SNPs per chromosome was 10000. One hundred QTLs were randomly distributed a...

متن کامل

اهمیت خویشاوندی ژنتیکی و رکورد فنوتیپی بر صحت ژنومی داده‌های جانهی شبیه‌ سازی شده با استفاده از مدل های حیوانی در حضور اثرات متقابل ژنوتیپ و محیط

The objective of this study was to investigate the role of genetic relationships between training and validation set with considering different ratio of phenotypic records of training set on accuracy of genomic prediction via animal models containing genotype × environment interactions in simulated imputation data. For this purpose, four different scenarios using 15k density containing differen...

متن کامل

Genotype imputation for the prediction of genomic breeding values in non-genotyped and low-density genotyped individuals

BACKGROUND There is wide interest in calculating genomic breeding values (GEBVs) in livestock using dense, genome-wide SNP data. The general framework for genomic selection assumes all individuals are genotyped at high-density, which may not be true in practice. Methods to add additional genotypes for individuals not genotyped at high density have the potential to increase GEBV accuracy with li...

متن کامل

ارزیابی صحت پیش‌بینی ژنومی در معماری‌های مختلف ژنومی صفات کمی و آستانه‌ای با جانهی داده‌های ژنومی شبیه‌سازی‌شده، توسط روش جنگل تصادفی

Genomic selection is a promising challenge for discovering genetic variants influencing quantitative and threshold traits for improving the genetic gain and accuracy of genomic prediction in animal breeding. Since a proportion of genotypes are generally uncalled, therefore, prediction of genomic accuracy requires imputation of missing genotypes. The objectives of this study were (1) to quantify...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Genetics

دوره 189 1  شماره 

صفحات  -

تاریخ انتشار 2011